R case study
Duke University
install.packages(c("tidyverse", "tidytext", "janeaustenr", "wordcloud2"))
I would like to take a moment to honor the land in Durham, NC. Duke University sits on the ancestral lands of the Shakori, Eno and Catawba people. This institution of higher education is built on land stolen from those peoples. These tribes were here before the colonizers arrived. Additionally, this land has borne witness to over 400 years of the enslavement, torture, and systematic mistreatment of African people and their descendants. Recognizing this history is an honest attempt to break out beyond persistent patterns of colonization and to rewrite the erasure of Indigenous and Black peoples. There is value in acknowledging the history of our occupied spaces and places. I hope we can glimpse an understanding of these histories by recognizing the origins of collective journeys.
Data cleaning & data wrangling
Tokenize corpora (unit of analysis)
Visualize word clouds (novelty)
Sentiment analysis
Analyzing word frequencies (tf-idf)
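The steps in this outline can be sketched in a few lines of tidy R. This is a minimal illustration, not the workshop's own script: it assumes the packages from the install.packages() call are available, uses the Jane Austen novels from janeaustenr as the corpus, and picks the Bing lexicon for sentiment. The object names (tidy_austen, top_words, and so on) are made up for the example.

```r
library(dplyr)
library(tidytext)
library(janeaustenr)

# Tokenize: one row per word. unnest_tokens() lowercases text and
# strips punctuation by default.
tidy_austen <- austen_books() %>%
  unnest_tokens(word, text)

# Clean: drop common stop words (stop_words ships with tidytext),
# then count what is left.
top_words <- tidy_austen %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

# Word cloud: wordcloud2() expects a data frame whose first two
# columns are the word and its frequency.
cloud <- wordcloud2::wordcloud2(head(top_words, 100))

# Sentiment: keep only words found in the Bing lexicon and tally
# positive vs. negative words per book.
book_sentiment <- tidy_austen %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(book, sentiment)

# tf-idf: weight each word by how distinctive it is for a book
# relative to the other books.
book_tf_idf <- tidy_austen %>%
  count(book, word, sort = TRUE) %>%
  bind_tf_idf(word, book, n) %>%
  arrange(desc(tf_idf))
```

Printing top_words, book_sentiment, or book_tf_idf shows ordinary tibbles, so each result can go straight into ggplot2 or further dplyr verbs.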
This is not a text analysis workshop. The foundations of text analysis require considerably more time than we have. This is a demonstration of leveraging tidy packages (tidyverse and tidytext) and sharing resources.
Text Mining with R
Each variable is a column
Each observation is a row
Each type of observational unit is a table
Tidy Data
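As a small illustration of the three principles, the sketch below reshapes a "wide" table of word counts into tidy form. The counts and column names are invented for the example and do not come from any real corpus.

```r
library(tibble)
library(tidyr)

# Hypothetical counts stored wide: one column per book, so the
# variable "book" is hidden in the column names.
wide_counts <- tribble(
  ~word,    ~emma, ~persuasion,
  "letter",    12,           7,
  "walk",       5,           9
)

# Tidy form: each variable (word, book, n) is a column and each
# word-book observation is a row.
tidy_counts <- pivot_longer(
  wide_counts,
  cols = -word,
  names_to = "book",
  values_to = "n"
)
```

The tidy version has one row per word-book pair, which is the same shape that tidytext produces after tokenizing and counting.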
tidytext::unnest_tokens()
tm – Text Mining Infrastructure in R
quanteda – Package for managing and analyzing textual data
gutenbergr – public domain text from Project Gutenberg
Read more of Text Mining with R: A Tidy Approach
Summer Institute for Computational Social Science
co-founded by Chris Bail & Matthew Salganik